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A  Software  Package  for  Unconstrained  Optimization  using  Tensor  Methods 


Abstract 


This  paper  describes  a  software  package  for  finding  the  unconstrained  minimizer  of  a  nonlinear 
function  of  n  variab'es.  Tb^  package  is  intended  for  problems  where  n  is  not  too  large,  say  n  <  100,  so 
that  the  cost  of  storing  one  n  x  n  matrix,  and  factoring  it  at  each  iteration,  is  acceptable.  The  software 
allows  the  user  to  choose  between  a  recently  developed  "tensor  method"  for  unconstrained  optimization, 
and  an  analogous  standard  method  based  upon  a  quadratic  model.  The  tensor  method  bases  each  iteration 
upon  a  specially  constructed  fourth  order  model  of  the  objective  function  that  is  not  significantly  more 
expensive  to  form,  store,  or  solve  than  the  standard  quadratic  model.  In  our  experience,  the  tensor  method 
requires  significantly  fewer  iterations  and  function  evaluations  to  solve  most  unconstrained  optimization 
problems  than  standard  methods  based  upon  quadratic  models,  and  also  solves  a  somewhat  wider  range  of 
problems.  For  these  reasons,  it  may  be  a  useful  addition  to  numerical  software  libraries. 
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I.  Introduction 

This  paper  describes  a  software  package  that  implements  the  tensor  method  for  unconstrained 
optimization  introduced  in  Schnabel  and  Chow  [1989],  The  package  solves  the  unconstrained  minimiza¬ 
tion  problem 

min  f  (x) :  /?"—»!?  , 

xeR" 

utilizing  either  analytic  or  finite  difference  gradients  and  Hessian  matrices  at  each  iteration.  The  software 
allows  the  user  to  choose  between  a  tensor  method  and  an  analogous  standard  method  based  upon  a  qua¬ 
dratic  model.  It  is  intended  for  problems  where  the  number  of  variables  n  is  not  too  large,  say  n<100,  so 
that  the  cost  of  storing  one  nxn  matrix  and  factoring  it  at  each  iteration  is  acceptable.  It  can  be  applied  to 
problems  where  «  =  1,  but  software  designed  specifically  for  one  variable  problems  is  likely  to  be  prefer¬ 
able. 

The  tensor  method  employed  in  the  software  bases  each  iteration  upon  a  specially  constructed 
fourth  order  model  of  the  objective  function.  This  model  interpolates  the  function  value  and  gradient  from 
the  previous  iterate,  as  well  as  the  current  function  value,  gradient,  and  Hessian  matrix.  The  model  is  con¬ 
structed  so  that  the  storage  it  requires,  and  the  arithmetic  operations  per  iteration  required  to  form  and 
solve  it,  arc  not  significantly  higher  than  for  a  standard  second  derivative  method  based  upon  a  quadratic 
model.  In  our  experience,  the  tensor  method  requires  significantly  fewer  iterations  and  function  evalua¬ 
tions  to  solve  most  unconstrained  optimization  problems  than  standard  unconstrained  optimization 
methods  based  upon  quadratic  models,  and  also  solves  a  somewhat  wider  range  of  problems.  For  these 
■  w -sons,  't  may  be  a  useful  addition  to  numerical  software  libraries. 

The  next  section  gives  a  very  brief  overview  of  the  tensor  method;  for  further  information,  see 
Schnabel  and  Chow  [1989].  Section  3  gives  an  overview  of  the  input,  output,  and  important  options  pro¬ 
vided  by  the  software.  In  Section  4  we  describe  the  user  interface  to  the  package,  which  includes  both  a 
simplified  (default!  and  a  longer  calling  sequence.  Section  5  describes  in  detail  all  the  possible  input  and 
output  parameters  for  the  package,  and  their  effects.  Section  6  mentions  a  few  implementation  details.  In 
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Sectic n  7  vvc  summarize  our  experience  using  this  package,  which  can  be  found  in  more  detail  in  Chow 
[1989], 

2.  Brief  Overview  of  Tensor  Method 

Each  iteration  of  the  tensor  method  is  based  upon  a  fourth  order  model  of  the  objective  function 
/  (x )  around  the  current  iterate  xc  that  has  the  form 

m(xc+d)  =  f(xc)  +  Vf(xc)Td+V2dTV2f(xc)d  +  ^Tcdi  +  ^-VcdA  .  (2.1) 

The  tensor  terms  Tc  and  Vc  are  three  and  four  dimensional  objects,  respectively,  that  are  chosen  so  that 
the  model  interpolates  function  and  gradient  information  from  previous  iterates.  Schnabel  and  Chow 
[1989]  allowed  the  mode!  to  interpolate  function  and  gradient  values  from  more  than  one  past  iterate,  but 
their  tests  showed  that  there  was  no  advantage  to  this  in  comparison  to  a  tensor  method  that  interpolates 
only  the  function  value  and  gradient  from  the  most  recent  past  iterate.  In  addition,  the  tensor  method  is 
significantly  simpler  when  only  information  from  one  past  iterate  is  used.  For  these  reasons,  this  software 
package  uses  the  version  of  the  tensor  method  that  only  interpolates  information  from  the  most  recent 
previous  iterate. 

Schnabel  and  Chow  [1989]  derive  a  procedure  for  determining  the  smallest  Tc  and  Vc,  in  the  Fro- 
benius  norm,  that  allow  (2.1)  to  interpolate  previous  function  and  gradient  values.  In  the  case  when  only 
one  past  iterate  is  used,  they  show  that  these  Tc  and  Vc  arc  rank-two  and  rank-one  tensors,  respectively. 
In  particular,  the  tensor  model  has  the  form 

m(xe+d)  =  f  (xc )  +  V/  (xc)T d  +  VidTV2f{xc)d  +  ^{sjdc)\bj de)  +  ^(sjdc  )4  (2.2) 

v.h.ra  „vc  is  the  step  from  xc  to  the  previous  iterate  xp ,  and  bceRn  and  aeR  arc  uniquely  determined  by 
the  ii'qeirar,.c::ts  {xp)  and  Vm(xp)=V/ (xp).  The  cost  of  finding  bc  and  a  is  nr  (for  the  calcula¬ 

tion  ,vj VV (xc)sc)  +  0(n )  multiplications  and  additions,  which  is  very  small  in  comparison  to  the  basic 
0(n})  per  iteration  cost  of  the  standard  method.  The  additional  storage  required  is  just  a  few  n -vectors, 
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for  t>c,  sc,  and  some  intermediate  quantities,  which  is  very  small  compared  to  the  n2/ 2  storage  for 

V2/  (xc). 

Near  a  minimizer,  the  step  taken  by  the  tensor  method  is  to  the  minimizer  of  (2.2).  Schnabel  ana 
Chow  [1989]  show  that  the  problem  of  minimizing  the  fourth  order  model  (2.2)  can  be  reduced  to  the 
problem  of  minimizing  a  fourth  order  polynomial  in  one  variable,  plus  the  minimization  of  a  quadratic 
function  in  n- 1  variables.  The  cost  of  this  process  is  only  4n2  +  O(n)  multiplications  and  additions  more 
than  the  basic  O  (n3)  of  minimizing  a  quadratic  model. 

As  in  all  noniinear  optimization  methods,  it  is  also  necessary  to  incorporate  a  procedure  that  allows 
the  method  to  converge  from  starting  points  that  are  far  from  the  solution.  Chow  [1989]  implemented 
both  a  line  search  and  a  trust  region  methods  for  the  tensor  method.  In  his  tests,  the  tensor  method  with  a 
trust  region  strategy  was  about  15%  more  efficient  (in  tenns  of  iterations  or  function  and  derivative 
evaluations)  than  the  tensor  method  with  a  line  search  strategy,  while  the  difference  in  robustness 
between  the  two  methods  was  indistinguishable.  The  line  search  tensor  method  was  still  about  25-35% 
more  efficient  than  either  a  standard  line  search  or  trust  region  method  based  upon  the  quadratic  model. 
In  addition,  the  line  search  tensor  method  is  much  simpler  to  implement  and  to  understand  than  the  trust 
region  tensor  method,  and  is  appreciably  faster  on  small,  inexpensive  problems  where  the  complexity  of 
the  code  becomes  the  dominant  cost.  For  these  reasons,  this  software  uses  a  line  search  method. 

In  the  line  search  tensor  method,  if  the  tensor  model  has  a  minimizer  and  it  is  in  a  descent  direction 
from  xc ,  then  we  search  along  the  direction  to  the  minimizer,  using  the  backtracking  line  search  algorithm 
A6.3. 1  from  Dennis  and  Schnabel  [1983].  This  line  search  starts  with  stcplength  one,  and  if  the  candidate 
point  <+  docs  not  satisfy  the  condition 

fix,)  <  f  (xc)  +  1 0-4  V/  (xc  )T  ix  +-xc )  ,  (2.3) 

it  reduces  this  s’^r'^ngth  by  some  factor  between  0.1  and  0.5,  and  continues  to  do  this  until  the  first  time 
that  the  candidate  point  satisfies  (2.3).  If  wc  cannot  use  the  tensor  model  for  the  reasons  mentioned 
above,  or  we  arc  at  the  first  iteration  and  there  is  no  previous  iterate,  wc  base  the  step  upon  the  standard 
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quadratic  model.  We  use  the  same  line  search,  and  calculate  the  search  direction  using  the  modified 
Cholcsky  factorization  of  Schnabel  and  Eskow  [1990],  The  search  direction  is 
— (V~y (xc )+ 1  |D  |  !/)~'V/(;tc),  where  D  is  the  diagonal  matrix  added  to  the  Hessian  in  the  modified 
Cholcsky  decomposition,  which  results  in  D  =  0  if  V2/  (X )  is  safely  positive  definite. 

When  we  can  use  the  tensor  model  to  find  a  potential  next  iterate,  it  turns  out  that  we  can  find  the 
search  direction  based  upon  the  quadratic  model  at  little  additional  cost.  For  this  reason,  we  always  also 
calculate  the  potential  next  iterate  based  upon  the  quadratic  model,  and  then  choose  the  point  with  the 
lower  function  value  as  our  next  iterate.  This  strategy  generally  only  costs  1  or  2  additional  function 
evaluations  (and  no  extra  derivative  evaluations)  per  iteration,  and  has  been  found  to  appreciably  improve 
the  efficiency  of  the  algorithm. 

A  high  level  description  of  the  tensor  method  is  contained  in  Algorithm  2.1. 


Algorithm  2.1  --  An  Iteration  of  the  Tensor  Method  Algorithm 

Given  current  iterate  xc,f  (xc),  previous  iterate  xp ,  /  (xp ),  V/  (xp)\ 

1.  Calculate  Vf  (xc)  and  decide  whether  to  stop.  If  not: 

2.  Calculate  V2/  ( xc ). 

3.  Set  sc  =  xc-xp ,  and  calculate  bc  and  a  in  the  tensor  model  (2.2)  so  the  the  tensor  model  interpolates 

/  (xp )  and  V/  (xp). 

4.  Calculate  the  the  minimizer  of  the  tensor  model;  if  it  has  one,  and  the  minimizer  is  in  a  descent  direc¬ 

tion  dj  from  xc ,  then  use  the  line  search  to  calculate  a  potential  acceptable  next  iterate  x+t  = 
xc  +\j  dj . 

5.  Calculate  the  search  direction  d +lv  based  upon  the  quadratic  model,  as  described  above  (d+;v  = 

-( V2/ (xc )+ 1  | D  |  I /)“’ V/ (xc),  D  >0),  and  use  the  line  search  to  calculate  a  potential  acceptable 
next  iterate  x+,v  =  xc  +A.,v  dy . 

6.  If  (the  tensor  line  search  was  conducted)  and  (f  ( x+r)-f  ( x+y )) 

then  set  x+=x+j 
else  set  x+=x+y. 

7.  Set  xp  =  xc,  xc  =.r+. 


-5- 


3.  Overview  of  the  Software  Package 

The  required  input  to  the  software  is  the  number  of  variables  n  (N),  a  subroutine  to  evaluate  the 
function  /  ( x )  and  its  name  (FCN),  an  initial  approximation  xq  to  the  solution  x .  (X),  and  the  row  dimen¬ 
sion  of  the  matrix  in  the  users  program  that  will  contain  the  Hessian  matrix  'NR).  The  user  may  also  pro¬ 
vide  subroutines  to  evaluate  V/(x)  and  V2/  (x),  but  these  subroutines  are  optional.  If  they  arc  not  pro¬ 
vided,  the  gradient  and  Hessian  are  approximated  by  finite  differences.  If  subroutines  to  calculate  ana¬ 
lytic  derivatives  arc  provided,  they  are  checked  at  the  start  of  the  algorithm  against  a  finite  difference 
approximation  to  detect  possible  coding  errors,  unless  the  user  chooses  to  disable  this  option  (see 
GRDFLG,  HESFLG,  Sec.  5). 

Upon  completion,  the  software  produces  its  final  iterate  x/  (XPLS),  the  value  of  the  function  f  (Xf) 
(FPLS),  the  gradient  gUf)  (GPLS),  the  Hessian  H(xf)  (H),  and  a  flag  specifying  under  which  stopping 
condition  the  algorithm  was  terminated  (MSG).  The  stopping  criteria  used  in  this  software  are  the  same 
as  in  the  UNCMIN  package  of  Schnabel,  Koontz  and  Weiss  [1985].  Informally,  they  arc:  (1)  V/  (*+)=0; 
(2)  x+~xc ;  (3)  the  package  could  not  satisfy  (2.3)  at  the  last  iteration;  (4)  the  iteration  limit  v.  as  exceeded; 
and  (5)  divergence  is  suspected.  If  any  of  these  conditions  is  satisfied,  the  algorithm  terminates.  In  our 
experience,  when  the  code  stops  due  to  V/ (*+)=(),  it  is  almost  always  near  a  local  minimizes  When  it 
stops  because  x+~xc  it  is  usually  near  a  solution;  this  tolerance  should  be  set  quite  small,  however,  since 
optimization  algorithms  sometimes  take  very  small  steps  while  still  far  from  the  solution.  When  the  algo¬ 
rithm  stops  because  the  last  iteration  could  not  satisfy  (2.2),  it  is  often  near  a  solution  and  cannot  achieve 
further  accuracy  due  to  finite  precision  arithmetic;  this  can  be  assessed  by  checking  the  size  of  the  gra¬ 
dient.  The  divergence  test  is  meant  to  detect  functions  that  arc  unbounded  below;  a  very  large  maximum 
step  size  is  imposed  in  the  line  search,  and  if  five  consecutive  steps  of  ill  is  size  are  taken,  divergence  is 
suspected.  A  more  precise  definition  of  these  criteria  and  recommended  values  for  the  applicable  toler¬ 
ances  (GRADTL,  STEPTL,  ITNLIM,  STEPMX)  are  given  in  Sec.  5. 
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The  user  has  the  option  to  choose  between  the  tensor  method  and  a  standard,  quadratic  model  based 
method.  In  our  experience,  the  tensor  method  usually  requires  fewer  iterations  and  function  and  derivative 
evaluations  than  the  standard  method,  but  this  may  vary  depending  upon  the  problem.  The  choice  is 
based  upon  the  input  parameter  METHOD;  the  default  choice  is  the  tensor  method.  As  previously  men¬ 
tioned,  for  both  the  tensor  and  standard  methods,  a  line  search  strategy  is  used. 


The  software  can  perform  scaling  of  the  variable  space.  If  the  user  inputs  a  typical  magnitude  cypxt 
of  each  component  of  then  the  performance  of  the  package  is  equivalent  to  what  would  result  from 
redefining  the  independent  variable  jc  in  the  user’s  function  with 


-*•  scaled 


Mcypxi 


■  x 


MtypXn 


and  then  ninning  the  package  without  scaling.  In  our  experience,  scaling  is  often  beneficial  when  dif¬ 
ferent  components  of  x  are  expected  to  have  widely  differing  magnitudes,  i.e.  differing  by  several  orders 
of  magnitude,  and  may  sometimes  be  necessary  in  order  for  the  software  to  successfully  solve  such  prob¬ 
lems.  The  default  is  no  scaling,  i.e.  each  typx-,  -  1.  Scaling  is  controlled  by  the  parameter  TYPX. 


The  user  can  have  the  software  print  out  information  at  each  iteration,  print  out  only  the  initial 
iterate  and  the  final  result,  or  not  print  out  anything.  When  MSG= 0,  the  software  will  not  print  out  any¬ 
thing;  this  is  often  useful  when  the  software  is  imbedded  in  other  software.  When  MSG=\  (the  default), 
the  software  will  print  out  the  initial  iterate  xq,  f  (x o),  and  V/(x0).  the  final  iterate  xj ,  f  (x/),  and 
Vf  (xj ),  the  reason  the  algorithm  was  terminated,  and  the  number  of  iterations  taken.  When  MSG  =2,  the 
software  prints  out  this  information  and  in  addition,  the  values  oixc,f ( xc ),  and  V/(xc)  at  each  iteration. 


The  user  can  input  an  estimate  of  the  number  of  accurate  digits  in  the  objective  function  f  (x), 
using  the  parameter  NDIGIT.  It  is  important  to  provide  this  information  whenever  the  number  of  accu¬ 
rate  digits  in  /  (x )  is  expected  to  be  significantly  fewer  than  the  full  double  precision  used  by  this  pack¬ 
age.  This  may  occur,  for  example,  when  f  (x)  is  itself  the  result  of  an  iterative  procedure  which  returns  an 
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answer  where  only  some  number  of  the  leading  digits  are  accurate,  or  if  the  routine  for/ (x)  is  written  in 
single  precision.  It  is  particularly  important  to  supply  this  information  when  the  function  evaluation  docs 
not  have  (nearly)  full  double  precision  accuracy  and  finite  difference  derivatives,  either  gradients  or  Hes¬ 
sians,  are  being  used. 


4.  Calling  the  Software  Package 

There  arc  two  ways  to  call  the  package.  If  the  user  wishes  to  override  the  default  values  of  any 
input  parameters,  or  to  supply  routines  to  evaluate  the  gradient  or  Hessian,  then  the  following  sequence  is 
used  : 

CALL  D FAULT (N  .TYPX  f  SCALE  ,CRADTL  STEPTL  JTNUM  .STEPMX  JPR  METHOD  . 

G  RDF  LG  .HESFLG  M DIGIT  MSG ) 

(code  to  override  specific  default  parameters  goes  here} 

CALL  TENSOR  (NR  M  X  fCN  ,GRD  MSN  JYPX  .FSCALE  .GRADTL  STEPTL  JTNUM  STEPMX , 
l PR  METHOD  .GRDFLG  ,HESFLG  M DIGIT  MSG  XPLS  .FPLS  ,GPLS  JL  JTNNO  .1 VRK  JWRK ) 
The  routine  DFAULT  sets  all  input  parameters  to  their  default  values,  so  that  the  user  only  needs  to 
specify  those  values  that  are  desired  to  have  different  values  than  the  defaults.  For  example,  if  the  user 
wishes  to  use  all  the  default  values  except  the  iteration  limit  (setting  it  instead  to  300),  and  wishes  to  sup¬ 
ply  analytic  gradients,  then  the  calling  sequence  would  be 

CALL  DFAULT (N  JYPX  f  SCALE  .GRADTL  STEPTL  JTNUM  STEPMX  JPR  METHOD  , 

GRDFLG  .HESFLG  M  DIG  IT  MSG ) 

ITNUM  =  300 
GRDFLG  =  \ 

CALL  TENSOR  (NR  X  X  fCN  ,GRD  MSN  JYPX  .FSCALE . GRADTL  .STEPTL  JTNUM  .STEPMX  . 

I  PR  METHOD  . GRDFLG  .HESFLG  N  DIGIT  MSG  XPLS . FPLS  ,GPLS  M  JTNNO  ,WRK  JWRK ) 
The  name  of  the  routine  for  evaluating  analytic  gradients  would  be  given  where  the  parameter  GRD  is 
shown,  and  this  routine  would  be  supplied  by  the  user.  In  addition,  the  values  of  NR  (the  row  dimension 
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of  user's  matrix  that  will  contain  the  Hessian),  N  and  X  would  be  supplied  by  the  user,  and  the  user 
would  provide  a  routine  to  evaluate  the  objective  function  f  (x)  and  supply  its  name  in  FCN .  WRK  and 
IWRK  arc  an  ,\7?xS  double  precision  array  and  an  NR  element  integer  vector,  respectively,  that  arc  used 
as  work  arrays  by  the  package.  These  arrays  must  be  declared  in  the  user’s  calling  program,  but  may  be 
given  different  names  than  WRK  and  IWRK . 

If  die  user  wishes  to  use  all  the  default  values  of  the  parameters  and  evaluate  derivatives  by  finite 
differences,  then  there  is  a  simpler  way  to  call  the  package.  It  is 

CALL  TEXSRD  (NR  jN  X  JFCN  MSC  XPLS  JFPLS  ,GPLS  ,H  JTNNO  ,WRK  JWRK ) 

( TENSRD  stands  for  Tensor  Default).  TENSRD  simply  calls  D FAULT  followed  by  TENSOR  .  The  user 
must  still  supply  values  of  NR,  N,  X,  the  routine  to  evaluate  f  (x)  and  its  name  in  FCN ,  and  the  work 
arrays  WRK  and  IWRK . 

5.  Parameters  and  Default  Values 

In  this  section  we  describe  the  parameters  for  the  software  package.  In  the  parameter  list,  die  sym¬ 
bol  — <—  or  * - >  follows  each  parameter.  These  symbols  specify  that  the  parameter  is  an  input,  output 

and  input-output  parameter,  respectively.  Most  of  the  input  parameters  do  not  have  to  be  supplied  by  the 
user  (see  Sec.  4  and  below);  if  they  arc  not  specified,  the  code  gives  diem  the  default  value  that  is 
specified  below. 

NR— >  :  A  positive  integer  specifying  the  row  dimension  of  die  matrices  H  and  WRK  in  the  user's  calling 
program.  (H  is  used  to  store  the  Hessian  matrix,  WRK  is  used  for  workspace.)  NR  must  be  greater  than 
or  equal  to  N .  This  provision  allows  the  user  to  solve  several  problems  with  different  values  of  N  while 
using  the  same  user  storage.  If  NR  <N ,  the  software  will  set  NR  =N  and  print  a  warning  message. 

N— *  :  A  positive  integer  specifying  the  number  of  variables  in  the  objective  function.  The  value  of  .V 
must  be  less  than  or  equal  to  the  value  of  NR .  If  N  <0,  the  program  will  abort.  If  /V  =  l,  die  program  will 
print  a  warning  message,  unless  MSC  is  set  to  0. 
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X— >  :  An  N  -vector  containing  the  initial  approximation  to  the  solution  x. . 

FCN— >  :  The  name  of  a  user  supplied  subroutine  that  returns  in  F  the  value  of  the  objective  function 
f  (x)  at  the  current  point  X .  FCN  must  be  declared  EXTERNAL  in  the  user’s  calling  piogum  and  must 
conform  to  the  usage 

CALL  FCN  (N  ,X  ,F), 

where  N  is  the  dimension  of  the  problem,  X  is  the  current  point,  and  F  is  the  function  value  at  the  current 
point.  FCN  must  not  alter  the  values  of  N  and  X . 

GRD— »  (Optional)  :  The  name  of  a  user  supplied  subroutine  that  returns  in  G  the  value  of  the  gradient 
V/u)  at  the  current  point  X .  GRD  must  be  declared  EXTERNAL  in  the  user’s  calling  program  and  must 
conform  to  the  usage 

CALL  GRD  (N  ,X  ,G), 

where  N  is  the  dimension  of  the  problem,  X  is  the  current  point,  and  C  is  the  gradient  at  the  current 
point.  GRD  must  not  alter  the  values  of  N  and  X . 

HSN— >  (Optional)  :  The  name  of  a  user  supplied  subroutine  that  returns  in  H  the  value  of  the  Hessian 
V2/  (x )  at  the  current  point  X .  HSN  must  be  declared  EXTERNAL  in  the  user’s  calling  program  and  must 
conform  to  the  usage 

CALL  HSN  (NR,  N  X ,  H ), 

where  NR  is  the  row  dimension  of  H  in  the  users  program,  N  is  the  dimension  of  the  problem,  X  is  the 
current  point,  and  H  is  the  Hessian  at  the  current  point.  HSN  must  not  alter  the  values  of  NR  ,  N ,  or  X . 

TYPX— >  (Optional)  :  An  N-vcctor  containing  the  scaling  vector.  The  default  value  is  TYPX  =(1,1 . 1 ).  If 

the  user  supplies  values  for  TYPX  ,  then  TYPX  [/  ]  should  be  the  absolute  value  of  the  estimated  magnitude 
of  ,r,  at  the  solut'^n  and/or  during  the  .r  lution  process.  Fo'  example,  if  it  is  anticipated  that  the  range  of 
values  for  the  iterates  will  be 
x,  g  [-1010,  10ln  J 


J 
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*;£  [-102,  104] 

x3  6  [-6x10-6, 9X10-6  ] 

then  an  appropriate  choice  would  be  TYPX  =  [  1010,  103,7xl0-6].  In  cases  like  this  where  the  magni¬ 
tudes  of  the  components  of  X  differ  substantially,  it  may  be  necessary  to  supply  scaling  information  in 
order  for  tfj  software  to  be  successful  and  efficient.  If  a  negative  value  is  specified  for  TYPX  [I],  its 
absolute  value  is  used,  while  if  0  is  specified,  1  is  used. 


FSC ALE— »  (Optional)  :  A  positive  real  number  estimating  the  magnitude  of  /  (x)  near  the  solution  x . . 
FSCALE  is  used  in  the  gradient  stopping  condition  given  below.  The  default  value  is  FSCALE = 1.  It  may 
be  helpful  to  specify  FSCALE  when  the  units  of/ (x)  cause  it  always  to  be  many  orders  of  magnitude 
different  from  1.  If  /  (jco)  is  much  greater  than  /  (x> ),  FSCALE  should  approximate  /(*.).  not/  (xo).  If  a 
negative  value  is  specified  for  FSCALE,  its  absolute  value  is  used,  while  ifO  is  specified,  1  is  used. 


GRADTL— >  (Optional)  :  A  positive  real  number  giving  the  tolerance  at  which  the  scaled  gradient  is  con¬ 
sidered  close  enough  to  zero  to  terminate  the  algorithm.  The  scaled  gradient  is  a  measure  of  the  relative 
change  in  /  (x )  in  the  direction  xt .  The  gradient  stopping  test  used  in  the  software  is 


max 

l 


I  V/OO,-  •  1  max(  U,-  |  ,TYPX[l]} 
max  { |/  |  ,  FSCALE  ) 


<  GRADTL 


DFAL’LT  returns  the  value  GRADTL  =  machcps1/3.  ( macheps  is  described  in  Section  6.)  If  a  negative 
value  is  specified  for  GRADTL ,  the  default  is  used. 


STEPTL— >  (Optional)  :  A  positive  real  number  containing  the  tolerance  at  which  the  scaled  step  length 
is  considered  close  enough  to  zero  to  terminate  the  algorithm.  The  test  used  in  the  software  is 


max 


I 


|  (x+),  —  (xc )/  | 
max  [TUOTl  .TYPX  [I]) 


<  STEPTL 


where  x ,  and  xc  arc  the  new  and  old  itcrativcs,  respectively.  If  the  value  of  STEPTL  is  too  large,  the 
software  may  terminate  prematurely.  DFAULT  returns  the  value  macheps2,13.  If  a  negative  value  is 
specified  for  SI EPTL ,  the  default  is  used. 
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ITNLIM-*  (Optional) :  A  positive  integer  specifying  the  maximum  number  of  iterations  to  be  performed 
before  the  program  is  terminated.  DFAULT  returns  the  value  100.  If  a  nonpositive  value  is  specified  for 
ITNLIM ,  the  default  is  used. 

STEPMX-*  (Optional)  :  A  positive  real  number  containing  the  maximum  scaled  step  size  allowed  in 
each  iteration.  DFAULT  returns  the  value 

STEP  MX  =  max{  |  |x0|  |2- 103.  103  } 

where  xq  is  the  initial  approximation  provided  by  the  user.  STEPMX  is  used  to  prevent  steps  that  would 
cause  the  optimization  function  to  overflow,  to  prevent  the  algorithm  from  leaving  the  area  of  interest  in 
parameter  space,  or  to  detect  divergence  in  the  algorithm.  STEPMX  should  be  chosen  small  enough  to 
prevent  the  first  two  of  these  occurrences  but  should  be  larger  than  any  anticipated  "reasonable"  step.  The 
algorithm  will  halt  and  provide  a  diagnostic  if  it  attempts  to  exceed  STEPMX  on  five  successive  itera¬ 
tions.  If  a  nonpositive  value  is  specified  for  STEPMX ,  the  default  is  used. 

IPR-»  (Optional)  :  A  positive  integer  containing  the  number  of  the  output  unit.  DFAULT  returns  the 
value  6  which  is  the  standard  FORTRAN  output  unit. 

METHOD—*  (Optional)  :  An  integer  flag  designating  which  method  to  use. 

METHOD  =  0  :  Use  the  standard  (quadratic  model  based)  method. 

METHOD  =  1  :  Use  the  tensor  method. 

DFAULT  returns  the  value  1.  If  a  value  other  than  0  or  1  is  specified  for  METHOD ,  1  is  used. 

GRDFLG—*  (Optional)  :  An  integer  flag  specifying  whether  a  routine  to  calculate  the  analytic  gradient  is 
provided  by  the  user. 

GRDFLG  =  0  :  No  analytic  gradient  supplied  by  user. 

CRDFLG  =  1  :  Analytic  gradient  supplied  by  user  (will  be  checked  against  finite  difference  gra¬ 
dient). 

GRDFLG  =  2  :  Analytic  gradient  supplied  by  user  (will  not  be  checked  against  finite  difference 


gradient). 
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When  GRDFLG  =  0,  the  gradient  values  are  computed  by  finite  differences.  When  GRDFLG  =  1  or  2, 
the  name  of  the  user  supplied  routine  that  evaluates  V/  ( x )  must  be  supplied  in  GRD .  When  GRDFLG  = 
1,  the  program  compares  the  value  of  the  user’s  analytic  gradient  routine  at  xq  with  a  finite  difference 
estimate,  and  aborts  the  program  if  the  relative  difference  between  any  two  components  is  greater  than 
0.01.  DFAULT  returns  the  value  0.  If  a  value  other  than  0, 1,  or  2  is  specified  for  GRDFLG ,  0  is  used. 

HESFLG-*  (Optional) :  An  integer  flag  specifying  whether  a  routine  to  calculate  the  analytic  Hessian  is 
provided  by  the  user. 

HESFLG  =  0 :  No  analytic  Hessian  supplied  by  user. 

HESFLG  =  1  :  Analytic  Hessian  supplied  by  user  (will  be  checked  against  finite  difference  Hes¬ 
sian). 

HESFLG  =  2  :  Analytic  Hessian  supplied  by  user  (will  not  be  checked  against  finite  difference 
Hessian). 

When  HESFLG  =  0,  the  Hessian  values  are  computed  by  finite  differences.  When  HESFLG  =  1  or  2,  the 
name  of  the  user  supplied  routine  that  evaluates  V2/  (*)  must  be  supplied  in  HSN .  When  HESFLG  =  1, 
the  program  compares  the  value  of  the  user’s  analytic  Hessian  routine  at  xq  with  a  finite  difference  esti¬ 
mate,  and  aborts  the  program  if  the  relative  difference  between  any  two  components  is  greater  than  0.01. 
DFAULT  returns  the  value  0.  If  a  value  other  than  0,  1,  or  2  is  specified  for  HESFLG ,  0  is  used. 

NDIGIT— »  (Optional)  :  An  integer  which  estimates  the  number  of  accurate  digits  in  the  objective  func¬ 
tion/  (*).  DFAULT  returns  the  value  -log  io(macheps).  If  a  nonpositivc  value  is  specified,  the  default  is 
used. 

MSG* — >  (Optional)  :  An  integer  which  upon  entering  the  software  package  specifics  the  type  of  printed 
output  to  be  produced,  and  which  upon  exiting  the  software  package  specifics  the  termination  condition. 
The  meaning  of  MSG  upon  input  is  : 

0  :  No  printed  output  will  be  produced. 

1  :  Print  out  the  values  of;t,/0c),  V/  (*)  at  the  initial  and  final  iterates,  die  total  number  of  itera¬ 
tions  that  were  taken,  and  the  reason  the  algorithm  was  terminated. 


-13- 


2  :  Print  out  the  values  of  x,  f  (x),  V/ (jc)  at  every  iteration,  the  total  number  of  iterations  that 
were  taken,  and  the  reason  the  algorithm  was  terminated. 

DFAULT  returns  the  value  1.  If  a  value  other  than  0,  1,  or  2  is  specified  tor  MSG ,  1  is  used. 

The  meaning  of  MSG  upon  exiting  the  software  package  is  : 

-1  :  The  norm  of  the  gradient  at  the  final  iterate  was  less  than  GRADTL . 

-2  :  The  length  of  the  last  step  was  less  than  STEPTL . 

-3  :  The  last  iteration  failed  to  locate  a  lower  point. 

-4  :  The  iteration  limit  has  been  exceeded. 

-5  :  Five  consecutive  steps  of  length  STEPMX  have  been  taken. 

-20  :  Nonpositive  value  of  N  was  input;  program  aborted. 

-21  :  Possible  coding  error  in  analytic  gradient,  because  analytic  gradient  at  xq  is  not  close 
enough  to  finite  difference  approximation;  program  aborted.  (This  check  can  be  overrid¬ 
den  by  setting  GRDFLG  =  2.) 

-22  :  Possible  coding  error  in  analytic  Hessian,  because  analytic  Hessian  at  x0  is  not  close 
enough  to  finite  difference  approximation;  program  aborted.  (This  check  can  be  overrid¬ 
den  by  setting  HESFLG  =  2.) 

XPLS<—  :  An  N-vector  containing  the  final  iterate,  which  is  the  best  approximation  found  to  the  minim- 
izer  of  /  (x). 

FPLS<—  ;  A  real  number  that  contains  the  function  value  at  the  final  iterate  XPLS . 

GPLS<—  :  An  N-vcctor  containing  the  gradient  value  at  the  final  iterate  XPLS . 

H<—  :  An  array  that  is  used  to  store  the  Hessian  matrix  V2/0O  at  each  iteration.  The  user  needs  to 
declare  this  array  to  have  dimension  NRxNC ,  where  NR  is  a  parameter  described  at  the  start  of  this  sec¬ 
tion  and  obeys  NR  >N ,  and  NC  is  any  integer  obeying  NC  >N .  The  Hessian  matrix  is  stored  in  the  first  N 
rows  and  columns  of  H .  Upon  exiting  the  software,  H  contains  the  Hessian  value  at  the  final  iterate 


XPLS . 
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ITNNO<—  :  A  positive  integer  containing  the  total  number  of  iterations  that  were  taken. 

WRK— »  :  The  name  of  an  NR  x8  double  precision  array  used  as  workspace  by  the  software  package.  It 
must  be  declared  in  the  user’s  calling  program  with  these  dimensions. 

IWRK— >  :  The  name  of  an  integer  array  used  as  workspace  by  the  software  package.  It  must  be  declared 
in  the  user’s  calling  program  and  have  dimension  Nl ,  where  Nl  is  any  integer  obeying  Nl  >N . 

6.  Implementation  Details 

The  software  package  is  coded  in  FORTRAN  77  using  double  precision.  The  user  must  declare  all 
parameters  that  arc  real  variables  to  be  double  precision. 

The  software  calculates  the  value  of  the  machine  epsilon,  defined  to  be  the  smallest  positive  real 
number  macheps  for  which  ( 1  +macheps  )  >  1  in  double  precision  on  that  computer,  in  the  subroutine 
MCHEPS .  On  some  computers,  the  returned  value  may  be  incorrect  due  to  compiler  optimizations.  The 
user  may  wish  to  check  the  computer  value  of  macheps  and,  if  it  is  incorrect,  replace  the  code  in  the  sub¬ 
routine  MCHEPS  with  the  statement 

EPS  =  correct  value  of  machine  epsilon. 

Several  components  of  the  software  package  are  taken  from  the  UNCMIN  unconstrained  minimiza¬ 
tion  package  of  Schnabel,  Koontz  and  Weiss  [1985],  These  routines  arc:  FORSLV  and  BAKSLV  (for¬ 
ward  and  backward  triangular  solve),  FSTOFD  and  SNDOFD  (first  and  second  order  finite  difference 
derivatives),  GRDCHK  and  HESCHK  (compare  the  finite  difference  gradient  and  Hessian  to  analytic  gra¬ 
dient  and  Hessian,  respectively),  LNSRCH  (line  search),  OPTSTP  (check  stopping  condition),  and  most 
of  OPTCHK  (check  input  parameters). 
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7.  Summary  of  Test  Results 

We  have  tested  this  software  on  the  set  of  unconstrained  optimization  problems  in  Mord,  Garbow 
and  Hillstrom  [1981],  All  of  these  problems  except  the  Powell  singular  problem  have  V2/  )  nonsingu¬ 

lar.  We  created  two  sets  of  singular  test  problems  V2/  (jc»)  having  rank  n— 1  and  n-2  respectively,  by 
modifying  the  nonsingular  test  problems  of  Mord,  Garbow  and  Hillstrom  [1981]  as  described  in  Schnabel 
and  Chow  [1989].  The  dimension  of  these  problems  ranges  from  2  to  30. 

The  test  results  for  the  problems  solved  successfully  by  both  methods  arc  summarized  in  Table  7.1 
below.  The  second  and  third  columns  are  computed  using  the  total  of  all  the  iterations,  or  all  the  function 
evaluations,  required  by  each  method  to  solve  all  these  problems.  This  is  a  reflection  of  the  cost  of  the 
entire  test  set.  Table  7.1  shows  that,  on  the  average,  the  tensor  method  required  about  28%  to  34%  fewer 
iterations,  and  about  22%  to  36%  fewer  function  evaluations,  to  solve  these  problems.  The  number  of 
iterations  would  be  an  accurate  indication  of  the  time  required  by  the  code  to  solve  a  problem  where  the 
number  of  variables  n  is  not  too  small  and  function  evaluation  is  not  too  expensive,  since  in  this  case  the 
cost  per  iteration  of  each  method  is  nearly  identical  and  is  dominated  by  the  factorization  of  an  nxn  sym¬ 
metric  matrix  at  each  iteration.  The  number  of  function  evaluations,  which  includes  finite  difference 
derivatives,  would  be  an  accurate  indication  of  the  time  required  by  the  code  to  solve  a  problem  where 
function  evaluation  is  expensive,  as  it  is  on  many  practical  problems.  Table  7.1  also  shows  that  the 
efficiency  of  the  tensor  method  was  rarely  worse  than  the  standard  method,  and  usually  better,  on  these 
test  problems. 


Table  7.1  —  Comparison  of  Tensor  and  Standard  Methods 


Test  Set 

■KTiTM  tfsnnsrsi 

mmmZ'Em 

Fn  Evalns,  Tensor 

Fn  Evalns,  Standard 

Tensor  Better 

Tensor  Worse 

Nonsingular 

0.714 

0.779 

32 

4 

10 

Singular,  rank  n-1 

0.658 

0.638 

39 

2 

10 

Singular,  rank  n-2 

0.674 

0.685 

31 

5 

17 
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In  addition,  Table  7.2  shows  that  the  tensor  method  solved  18  problems  that  the  standard  method 
did  not  solve,  while  the  standard  method  solved  3  problems  that  the  tensor  method  did  not  solve,  with  the 
tensor  method  saving  a  greater  advantage  on  the  singular  problems  than  on  the  nonsingular  problems. 

These  results  indicate  that  the  tensor  method  is  likely  to  be  more  efficient  than  the  standard  method 

in  solving  nonsingular  and  singular  unconstrained  optimization  problems,  and  that  it  may  solve  a  wider 

range  of  problems.  They  also  indicate  that  for  any  particular  problem,  it  may  be  advantageous  to  have 
both  methods  available. 


Table  7.2  --Number  of  Test  Problems  Solved  by  One  Method  Only 


Test  Set 

Solved  by  Tensor  Method  Only 

Solved  by  Standard  Method  Only 

Nonsingular 

6 

1 

Singular,  rank  n-\ 

5 

0 

Singular,  rank  n  -2 

7 

2 
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choose  between  a  recently  developed  "tensor  method"  for  unconstrained  optimization,  and  an  analogous 
standard  method  based  upon  a  quadratic  model.  The  tensor  method  bases  each  iteration  upon  a  specially 
constructed  fourth  order  model  of  the  objective  function  that  is  not  significantly  more  expensive  to  fonn, 
store,  or  solve  than  the  standard  quadratic  model.  In  our  experience,  the  tensor  method  requires 
significantly  fewer  iterations  and  function  evaluations  to  solve  most  unconstrained  optimization  problems 
than  standard  methods  based  upon  quadratic  models,  and  also  solves  a  somewhat  wider  range  of  prob¬ 
lems.  For  these  reasons,  it  may  be  a  useful  addition  to  numerical  software  libraries. 
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